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Note: This is a working paper which will be expanded/updated frequently. The directory 
deleeuwpdx.net/pubfolders/secstress has a pdf copy of this article, the complete Rmd file 
that includes all code chunks, and R files with the code. Suggestions are welcome 24/7. 

1 Problem 

Define the multidimensional scaling (MDS) loss function 


cr r {x) = ^2 w i(Si - ( x'Aix ) r ) 2 , ( 1 ) 

i= 1 

with r > 0 and the positive semi-definite. The Wi are positive weights, the Si are non¬ 
negative dissimilarities. We call this rStress (De Leeuw, Groenen, and Mair (2016)). Special 
cases are stress (Kruskal 1964) for r = |, sstress (Takane, Young, and De Leeuw 1977) for 
r — 1, and the loss function used in MLILTISCALE (Ramsay 1977) for r —> 0. 
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In this paper we are interested in the first and second derivatives of rStress, and in the various 
applications of these derivatives to the problem of minimizing rStress. 

2 Derivatives 

Compact expression for the first and second derivatives of a r can be given by defining the 
matrices 


We then have 


= J2wi5i(x'Aix) r 1 A i , 
1=1 
n 

= J2 w i( x 'A i x) 2r - 1 A i , 


B r (x) : 

C r (x) : 

S r (x) := '^2,w i 8i( x 'Ai 
1=1 
n 

T r (x) := J2 w i(x'Aix] 


i= 1 
n 


' A x) r -t 


2r—l 


Ai + 2(r — l)—j 


AjXx'Aj 


ry*f /\ . /y» 

t L/ ill <1/ 


i= 1 


Ai + 2(2r — 1 )—— 


AjXx'Aj, 


rv*t /j . /y* 
ih ill ih 


( 2 ) 

( 3 ) 

( 4 ) 

( 5 ) 


Va r (x) = —4 r{B r (x) — C r (a;)}a;, 


( 6 ) 


and 

V 2 cr r (x) = —4:r{S r (x) — T r (x)}. (7) 

Note that S r (x) is positive semi-definite for r > \ and T r (x) is positive-semi-definite for 
r > 

We have written the R function mdsDerivatives () to evaluate the gradient and Hessian. Just 
to make sure our formulas are correct, the code can optionally compute numerical derivatives 
using the numDeriv package of Gilbert and Varadhan (2014). 


3 Newton’s Method 

Newton’s method to minimize rStress is 


^(fc+i) = x (k) _ [ Sr ( x W) - T r (x^)]-\B r (x^) - C r (xW))xW. (8) 

As we can expect in highly nonlinear situations like MDS, Newton’s method without safeguards 
sometimes works and sometimes doesn’t. If it works, it is generally fast, which is of some 
interest at least because the majorization method developed in De Leeuw, Groenen, and Mair 
(2016) for minimizing rStress can be very slow, especially for r > 
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4 Dutch Political Parties 


Our main example in the paper is are the dissimilarity measures for nine Dutch political 
parties, collected by De Gruijter (1967). 

## KVP PvdA VVD ARP CHU CPN PSP BP 

## PvdA 5.63 

## VVD 5.27 6.72 

## ARP 4.60 5.64 5.46 

## CHU 4.80 6.22 4.97 3.20 

## CPN 7.54 5.12 8.13 7.84 7.80 

## PSP 6.73 4.59 7.55 6.73 7.08 4.08 

## BP 7.18 7.22 6.90 7.28 6.96 6.34 6.88 

## D66 6.17 5.47 4.67 6.13 6.04 7.42 6.36 7.36 


Newton’s method converges in all cases, although it often behaves very erratically in the 
early iterations. Table 1 shows the number of iterations, the rStress value, the maximum 
norm of the gradient, and the smallest eigenvalue of the Hessian at the solution. 


## 

r: 

0.40 

iters: 

60 

rStress: 

0.07488639 

maxGrad: 

0.00000000 

minHess: 

-23.37 

## 

r: 

0.45 

iters: 

13 

rStress: 

0.06911461 

maxGrad: 

0.00000000 

minHess: 

-7.682 

## 

r: 

0.50 

iters: 

48 

rStress: 

0.15944252 

maxGrad: 

0.00000003 

minHess: 

-12.72 

## 

r: 

0.55 

iters: 

58 

rStress: 

0.07685229 

maxGrad: 

0.00000000 

minHess: 

-4.271 

## 

r: 

0.65 

iters: 

55 

rStress: 

0.07731578 

maxGrad: 

0.00000000 

minHess: 

-o.ooc 

## 

r: 

0.75 

iters: 

9 

rStress: 

0.14507211 

maxGrad: 

0.00000000 

minHess: 

-3.125 

## 

r: 

0.90 

iters: 

60 

rStress: 

0.13133901 

maxGrad: 

0.00000000 

minHess: 

-O.OOC 

## 

r: 

1.00 

iters: 

26 

rStress: 

0.14925820 

maxGrad: 

0.00000000 

minHess: 

-O.OOC 

## 

r: 

2.00 

iters: 

47 

rStress: 

0.35796584 

maxGrad: 

0.00000000 

minHess: 

-O.OOC 


Table 1: Newton solutions with various r 

Clearly for the majority of solutions Newton stops at a saddle point, or at least a flat spot 
fairly close to a local minimum. Only for large values of r do we find a proper local minimum. 
For values of r less than .40 we cannot get Newton to work. It rapidly diverges into regions 
with very large values of both x and rStress. The configurations in figure 1 also seem to differ 
quite a bit for smaller values of r. Note the increased clustering for increasing r, until finally 
for r — 2 parties are put in the edges of an equilateral triangle. 
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Newton for r = 0.4 
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Figure 1: Newton configurations with various r 
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5 Majorizing Newton 

In this section we limit ourselves to the case r > |. Without loss of generality we assume the 
dissimilarities are scaled by 

n 

J>A 2 = 1. (9) 

i=l 

Next, it is convenient to define 
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( 10 ) 


Pr{x) - WiSi(x'AiX) r , 
i= 1 

and 

n 

Vr( x ) ■= J2 W i( X>A i X ) 2r ’ ( X1 ) 

1=1 

so that 

a r (x) = 1 - 2 p r {x) + rj r (x). (12) 

If r > 7 , then both p r and p r are convex (De Leeuw, Groenen, and Mair (2016)). Thus 

Pr{x) > p r (y) + (x- y)'Vp(y), (13) 

for all x and y , which translates to the majorization 

cr r (x) = nunC(a:,y), (14) 

where 

C( x , y) ■= 1 - 2 pr(y) - 4 r(x - y)'B r (y)y + p r (x). (15) 

Now consider the algorithm where we use block relaxation to alternate over minimization 
over ( over x and y. By definition 

argmin ((x, y) — x, (16) 

so minimization over y for given x is trivial. We minimize ( over x for given y by using 
one or more steps of Newton’s method, relying on the fact that ( is convex in x for given 
y. Thus there will be no local minima problem with Newton, although we may observe 
non-convergence. Note that it will not be neceesary for convergence to iterate Newton to 
convergence between updates of y. In fact we propose an algorithm in which only a single 
Newton step is done. 

The derivatives needed for the Newton steps are 

V) = -4 r(B r (y)y - C r (x)x), (17) 

and 
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T>n((x,y) = 4r T,.(x). 


(18) 


Thus the two-block algorithm with a single Newton step becomes 


y w = x (k \ (19) 

x (k+i) = x (k) _ [T r (x w )]~\B r (y^)y^ - C r {x^ k) )x w ), ( 20 ) 

but this is of course equivalent to the algorithm 

x (k+i) = x (k) _ [ Tr (x^)]~\B r (x^) - C r (x^))x {k \ (21) 

This is what we have implemented in our R program, using the parameter linearize=TRUE. 
By default linearize=FALSE, which is the standard uncorrected Newton method. 

The idea is to give up some speed (and quadratic convergence) by gaining stability. In table 
2 we do see larger numbers of iterations (but iterations are marginally faster because they do 
not need S r (x)). We also have observed monotone convergence of loss function values in all 
cases, and we see that convergence is always to a local minimum. In most cases, except for 
r — 1, the solution found has a lower loss function value than the one found by the Newton 
method. Remember, however, that our majorization method is only guaranteed to work for 


## 

r: 

0.40 

iters: 

288 

rStress: 

0.02854517 

maxGrad: 

0.00000011 

minHess: 

-o.ooc 

## 

r: 

0.45 

iters: 

268 

rStress: 

0.03823655 

maxGrad: 

0.00000009 

minHess: 

-0.000 

## 

r: 

0.50 

iters: 

729 

rStress: 

0.04460338 

maxGrad: 

0.00000011 

minHess: 

-O.OOC 

## 

r: 

0.55 

iters: 

186 

rStress: 

0.05524495 

maxGrad: 

0.00000009 

minHess: 

-0.000 

## 

r: 

0.65 

iters: 

104 

rStress: 

0.07731578 

maxGrad: 

0.00000006 

minHess: 

-O.OOC 

## 

r: 

0.75 

iters: 

96 

rStress: 

0.10711307 

maxGrad: 

0.00000006 

minHess: 

-O.OOC 

## 

r: 

0.90 

iters: 

150 

rStress: 

0.13989729 

maxGrad: 

0.00000005 

minHess: 

-0.000 

## 

r: 

1.00 

iters: 

1000 

rStress: 

0.15444014 

maxGrad: 

0.00000008 

minHess: 

-O.OOC 

## 

r : 

2.00 

iters: 

53 

rStress: 

0.23176557 

maxGrad: 

0.00000005 

minHess: 

-o.ooc 


Table 2: Majorization solutions with various r 

The configurations found by the majorization method are more stable over different values 
of r, and show the familar effect of becoming more and more clustered if r increases. Note 
that for r — 2 the majorization method finds a better location of the parties to the edges, 
although finding the optimum allocation is of course a combinatorial problem. 
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Newton for r = 0.4 
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Figure 2: Majorization with various r 
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In table 3 we give the rStress values and iteration numbers for the scalar majorization 
algorithm of De Leeuw, Groenen, and Mair (2016). We see that the rStress values for r > | 
are basically the same as the ones from the majorization algorithm in this paper, but the 
number of iterations is much larger. In fact, it is larger than 100,000 for r = 1 and r = 2. 


## 

r: 

0.10 

iters: 

29103 

rStress: 

0.00546400 

## 

r: 

0.25 

iters: 

3605 

rStress: 

0.00631000 

## 

r: 

0.50 

iters: 

3566 

rStress: 

0.04460300 

## 

r: 

0.75 

iters: 

3440 

rStress: 

0.10711300 

## 

r: 

1.00 

iters: 

NA 

rStress: 

0.15539200 

## 

r: 

2.00 

iters: 

NA 

rStress: 

0.23487700 


7 











Tabic 3: Scalar majorization solutions with various r 


6 Sensitivity Analysis 

The second derivatives can also be used to draw sensitivity regions around points in an 
MDS solution. At a point x where the first derivatives vanish and the Hessian is positive 
semi-definite, we have 


a r {y) « a r (x) + ^(x - y)'V 2 a r (x)(x - y ), (22) 

and thus {y \ a r (y) < a} is approximately the ellipsoid 

{y I - y)'T> 2 a r (x)(x - y) < 2(a - a r (x))}. (23) 

For graphics in the plane we take 2x2 principal submatrices of the Hessian and draw ellipses, 
for example by using the R package car (Fox and Weisberg (2011)). We have to remember 
that in car the shape matrix is the inverse of our second derivative matrix, while their radius 
parameter corresponds with our \j2(a — a r (x)). 

We illustrate this with the majorization solution for r = |, which has rStress 0.0446034. In 
figure 3 we choose a — a r = .001, which means we look for the solutions which have rStress 
larger than 0.0446034 by 0.001 or less. 

## Loading required package: carData 



Figure 3: Sensitivity regions for r = 0.5 




7 Nonmetric MDS 


Our main function newtonMeO has parameter nonmetric, by default FALSE, and ties, by 
default "primary". It uses the algorithm from De Leeuw (2016) to perform a monotonic 
regression after updating the configuration. We start with three runs for r = The first is 
Newton, the second Newton with majorization, and the third non-metric majorized Newton. 
Since the data do not have many ties (in fact just one) there is no opportunity to compare 
primary, secondary, and tertiary. 

For the number of iterations in the three runs we find 

## [1] 42 729 489 

and for rStress 


## [1] 0.099970774 0.044603383 0.008436025 


If we compare the configurations in figure 4 we see how the non-metric solution is less 
fine-grained than the metric one, although of course the fit is vastly improved. 
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Figure 4: Three solutions for r = 1/2 

The Shepard diagram in figure 5 shows the optimal non-metric transformation of the data. 
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Figure 5: Shepard diagram for non-metric solution 

A somewhat more elaborate example uses the Ekman (1954) color data. These have also 
been analyzed with various values of r in De Leeuw, Groenen, and Mair (2016). The data are 
well known for their excellent fit and for the very regular circular pattern in the recovered 
configurations. The data have quite a few ties, the 91 dissimilarities have only 47 unique 
values. 
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non-metric majorized Newton with both primary and secondary approach to ties. This gives 
a total of 8 analyses. 

Let’s look at the results for r = \ first. Both Newton and Majorized Newton converge to 
the same solution. The two non-metric solutions have much lower rStress, and take more 
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iterations to converge. But the configurations in figure 6 show all four configurations are 
basically the same. 


## Newton: 

## Majorized Newton: 

## Non-metric Majorized 
## Non-metric Majorized 


Newton (Primary): 
Newton (Secondary): 


iterations: 

7 

iterations: 

47 

iterations: 

191 

iterations: 

115 


rStress: 0.01721325 
rStress: 0.01721325 
rStress: 0.00053373 
rStress: 0.00099767 


Table 4: Ekrnan Solutions with r 
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Figure 6: Four Ekrnan solutions for r = 1/2 

Shepard plots for the primary and secondary approach to ties are tight and slightly convex. 
Again, they differ only in detail. 
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Figure 7: Ekman Shepard Plots r = 1/2 

For r = 1 Newton converges to a local maximum, with all points in the origin. The other 
three solutions in figure 8 are basically the same. They do not differ much from the plots for 
r — ^ • maybe exhibit a bit more clustering. The Shepard plots in figure 9 are cnsiderably 
more convex and non-linear than the ones for r = |, again indicating clustering (small 
dissimilarities become smaller after transformation, larger ones become larger). 


## 

NA 

iterations: 4 

rStress: 

1.00000000 

## 

NA 

iterations: 65 

rStress: 

0.09306315 

## 

NA 

iterations: 281 

rStress: 

0.00090145 

## 

NA 

iterations: 139 

rStress: 

0.00238525 

Table 5: 

Ekman Solutions with 

r = 1 
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Figure 8: Four Ekman solutions for r = 1 
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8 Code 


library (numDeriv) 
library (MASS) 

amalgm <- function (x, w = rep (1, length (x))) { 
dyn.load ("pava.so") 
n <- length (x) 
a <- rep (0, n) 
b <- rep (0, n) 
y <- rep (0, n) 

If <- 

.Fortran ( 

"AMALGM", 

n = as.integer (n), 
x = as.double (x), 
w = as.double (w), 
a = as.double (a), 
b = as.double (b), 
y = as.double (y), 
tol = as.double (le-15) , 
ifault = as.integer (0) 

) 

return (lf$y) 

} 

isotone <- 
function (x, 

y> 

w = rep (1, length (x)), 
ties = "secondary") { 
f <- sort(unique (x)) 
g <- lapply(f, function (z) 
which(x == z)) 
n <- length (x) 
k <- length (f) 
if (ties == "secondary") { 
w <- sapply (g, length) 
h <- lapply (g, function (x) 
yM) 

m <- sapply (h, sum) / w 
r <- amalgm (m, w) 
s <- rep (0, n) 
for (i in 1: k) 
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s [g [ [i] ] ] <- r [i] 

> 

if (ties == "primary") { 

h <- lapply (g, function (x) 
yM) 

m <- rep (0, n) 
for (i in 1: k) { 

ii <- order (h[[i]]) 

g[ [i] ] <- gC [i] 3 [ii] 

h[ [i] ] <- h[[i]] [ii] 

} 

m <- unlist (h) 

r <- amalgm (m, w) 

s <- r [order (unlist (g))] 

> 

if (ties == "tertiary") { 
w <- sapply (g, length) 
h <- lapply (g, function (x) 

yW) 

m <- sapply (h, sum) / w 
r <- amalgm (m, w) 
s <- rep (0, n) 
for (i in l:k) 

s[g[[i]]] <- y[g[[i]]] + (r[i] - m[i]) 

} 

return (s) 

} 


rStress <- function (x, w, delta, a, r) { 
n <- length (a) 
s <- 0 

for (i in l:n) { 

xax <- sum (x * (a[[i]] °/ 0 *7 0 x)) 
s <- s + w[i] * (delta[i] - xax r) ~ 2 

> 

return (s) 

} 

mdsDerivatives <- function (x, w, delta, a, r, numerical = FALSE) { 
m <- length (x) 
n <- length (a) 

b <- c <- s <- t <- matrix (0, m, m) 
for (i in l:n) { 
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xa <- drop (a[[i]] 1*1 x) 
xax <- sum (x * xa) 

b <- b + w[i] * delta [i] * (xax ~ (r - 1)) * a[[i]] 

c <- c + w[i] * (xax ~ (2 * r - 1)) * a[[i]] 

s <- 

s + w[i] * delta[i] * (xax ~ (r - 1)) * (a[[i]] + 2 * (r - 1) * outer(xa, xa) / x 
t <- 

t + w[i] * (xax ~ (2 * r - 1)) * (a[[i]j +2*(2*r-l)* outer(xa, xa) / xax 

} 

gan < - 4 * r * drop ((b - c) 1*1 x) 

han <- -4 * r * (s - t) 

result <- list ( 
b = b, 
c = c, 
s = s, 
t = t, 
gan = gan, 
han = han 

) 

if (numerical) { 
gnu <- grad ( 
rStress, 
x, 

w = w, 

delta = delta, 
a = a, 
r = r 

) 

hnu <- hessian ( 
rStress, 
x, 

w = w, 

delta = delta, 
a = a, 
r = r 

) 

result <- c (result, list (gnu = gnu, hnu = hnu)) 

} 

return (result) 


torgerson <- function(delta, p = 2) { 
doubleCenter <- function(x) { 
n <- dim(x) [1] 
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m <- dim(x)[2] 

s <- sum(x) / (n * m) 

xr <- rowSums(x) / m 

xc <- colSums(x) / n 

return((x - outer(xr, xc, "+")) + s) 

} 

z <- 

eigen(-doubleCenter ( (as.matrix (delta) 2) / 2) , symmetric = TRUE) 
v <- pmax(z$values, 0) 

return(z$vectors[, 1:p] diag(sqrt (v[1:p]))) 

} 

u <- function (i, n) { 

return (ifelse (i == l:n, 1, 0)) 

} 

e <- function (i, j, n) { 
d <- u (i, n) - u ( j , n) 
return (outer (d, d)) 

} 

directSum <- function (x) { 
m <- length (x) 
nr <- sum (sapply (x, nrow)) 
nc <- sum (sapply (x, ncol)) 
z <- matrix (0, nr, nc) 
kr <- 0 
kc <- 0 

for (i in l:m) { 
ir <- nrow (x [ [i]]) 
ic <- ncol (x[[i]]) 

z [kr + (l:ir), kc + ( 1: ic)] <- x[[i]] 
kr <- kr + ir 
kc <- kc + ic 

} 

return (z) 

} 

repList <- function(x, n) { 
z <- list() 
for (i in l:n) 

z <- c(z, list(x)) 
return(z) 


17 


makeA <- function (n, p = 2) { 
m <- n * (n - 1) / 2 
a <- list() 
for (j in 1: (n - 1) ) 
for (i in (j + 1) :n) { 
d <- u (i, n) - u (j , n) 
e <- outer (d, d) 

a <- c(a, list (directSum (repList (e, p)))) 

> 

return (a) 

} 

newtonMe <- 

function (delta, 

xini = NULL, 

w = rep (1, length (delta)), 

p = 2, 

r = .5, 
eps = le-15, 
itmax = 1000, 
linearize = FALSE, 
nonmetric = FALSE, 
ties = "primary", 
verbose = TRUE) { 
n <- nrow (as.matrix (delta)) 
dhat <- delta / sqrt (sum (delta 2)) 
if (is.null (xini)) { 

xold <- as.vector (torgerson (dhat, p)) 

} else { 

xold <- xini 

> 

a <- makeA (n, p) 
dold <- sapply (a, function (u) 
sum (xold * (u 1*1 xold))) 
eold <- dold ~ r 

sold <- sum (w * (dhat - eold) ~ 2) 
itel <- 1 
repeat { 

h <- mdsDerivatives (xold, w, dhat, a, r) 
if (linearize) { 

xnew <- drop (xold - ginv (4 * r * h$t) 1*1 h$gan) 
} else { 

xnew <- drop (xold - ginv (h$han) 1*1 h$gan) 

} 
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dnew <- sapply (a, function (u) 
sum (xnew * (u °/ 0 * 0 / 0 xnew))) 
enew <- dnew ~ r 
if (nonmetric) { 

dhat <- isotone (delta, enew, ties = ties) 
dhat <- dhat / sqrt (sum (dhat 2)) 

} 

snew <- sum (w * (dhat - enew) ~ 2) 
if (verbose) { 
cat ( 

formate (itel, width = 4, format = "d"), 
formate ( 
sold, 

digits = 10, 
width = 13, 
format = "f" 

), 

formate ( 
snew, 

digits = 10, 
width = 13, 
format = "f" 

), 

"\n" 

) 

} 

if ((itel == itmax) II (abs(sold - snew) < eps)) 
break 

itel <- itel + 1 
xold <- xnew 
dold <- dnew 
sold <- snew 

> 

return (list ( 

x = matrix (xnew, n, p), 

d = dnew, 

dhat = dhat, 

rstress = snew, 

g = h$gan, 

h = h$han, 

itel = itel 

)) 
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9 NEWS 


001 01/17/16 
002 01/17/16 
003 01/17/16 
004 01/18/16 
005 01/18/16 
006 01/18/16 
007 01/18/16 
008 01/19/16 
009 01/20/16 


Sadly incomplete version 

Added Newton with Majorization 

Added addional runs, many edits 

Figures redone, tables redone, discussion added 

Compare with older algorithm 

Add sensitivity regions 

Numerous small corrections 

Added nonmetric options 

Added Ekman example 
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